Conditional independence

In probability theory, conditional independence describes situations wherein an observation is irrelevant or redundant when evaluating the certainty of a hypothesis. Conditional independence is usually formulated in terms of conditional probability, as a special case where the probability of the hypothesis given the uninformative observation is equal to the probability without. If A is the hypothesis, and B and C are observations, conditional independence can be stated as an equality:

:P(A \mid B, C) = P(A \mid C)

where P(A \mid B, C) is the probability of A given both B and C. Since the probability of A given C is the same as the probability of A given both B and C, this equality expresses that B contributes nothing to the certainty of A. In this case, A and B are said to be conditionally independent given C, written symbolically as (A \perp\!\!\!\perp B \mid C). The concept of conditional independence is essential to graph-based theories of statistical inference, as it establishes a mathematical relation between a collection of conditional statements and a graphoid.


Conditional independence of events

Let A, B, and C be events. A and B are said to be conditionally independent given C if and only if P(C) > 0 and

:P(A \mid B, C) = P(A \mid C).

This property is often written (A \perp\!\!\!\perp B \mid C), which should be read ((A \perp\!\!\!\perp B) \mid C). Equivalently, conditional independence may be stated as

:P(A, B \mid C) = P(A \mid C)\,P(B \mid C)

where P(A, B \mid C) is the joint probability of A and B given C. This alternate formulation states that A and B are independent events, given C. It demonstrates that (A \perp\!\!\!\perp B \mid C) is equivalent to (B \perp\!\!\!\perp A \mid C).
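The equivalence can be sanity-checked numerically. The following Python sketch uses made-up joint probabilities, constructed so that A and B are conditionally independent given C, and verifies that both formulations hold:

# Numerical sanity check that the two definitions agree. The joint
# probabilities below are made-up values; atoms are (a, b, c) indicator
# triples for membership in A, B, C, and P(.) sums over atoms.

# Inside C: P(A | C) = 0.5 and P(B | C) = 0.3, chosen independent.
p = {
    (1, 1, 1): 0.5 * 0.3 * 0.4,  (1, 0, 1): 0.5 * 0.7 * 0.4,
    (0, 1, 1): 0.5 * 0.3 * 0.4,  (0, 0, 1): 0.5 * 0.7 * 0.4,
    (1, 1, 0): 0.30,             (1, 0, 0): 0.05,
    (0, 1, 0): 0.05,             (0, 0, 0): 0.20,
}
assert abs(sum(p.values()) - 1.0) < 1e-12

def pr(pred):
    """Probability of the set of atoms selected by a predicate."""
    return sum(q for atom, q in p.items() if pred(atom))

P_ABC = pr(lambda t: t[0] and t[1] and t[2])   # P(A, B, C)
P_AC  = pr(lambda t: t[0] and t[2])            # P(A, C)
P_BC  = pr(lambda t: t[1] and t[2])            # P(B, C)
P_C   = pr(lambda t: t[2])                     # P(C)

# Definition 1: P(A | B, C) = P(A | C)
assert abs(P_ABC / P_BC - P_AC / P_C) < 1e-12
# Definition 2 (equivalent): P(A, B | C) = P(A | C) P(B | C)
assert abs(P_ABC / P_C - (P_AC / P_C) * (P_BC / P_C)) < 1e-12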


Proof of the equivalent definition

:P(A, B \mid C) = P(A \mid C)\,P(B \mid C)
:iff \frac{P(A, B, C)}{P(C)} = \left(\frac{P(A, C)}{P(C)}\right) \left(\frac{P(B, C)}{P(C)}\right)      (definition of conditional probability)
:iff P(A, B, C) = \frac{P(A, C)\,P(B, C)}{P(C)}      (multiply both sides by P(C))
:iff \frac{P(A, B, C)}{P(B, C)} = \frac{P(A, C)}{P(C)}      (divide both sides by P(B, C))
:iff P(A \mid B, C) = P(A \mid C)      (definition of conditional probability) \therefore


Examples


Coloured boxes

Each cell represents a possible outcome. The events R, B and Y are represented by the areas shaded red, blue and yellow respectively, and the overlap between the events R and B is shaded purple. The probabilities of these events are shaded areas with respect to the total area. In both examples R and B are conditionally independent given Y because:

:\Pr(R, B \mid Y) = \Pr(R \mid Y)\,\Pr(B \mid Y)

but not conditionally independent given \left[\text{not } Y\right] because:

:\Pr(R, B \mid \text{not } Y) \ne \Pr(R \mid \text{not } Y)\,\Pr(B \mid \text{not } Y)
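The same pattern can be reproduced by enumeration. The Python sketch below uses a hypothetical 8-cell layout (not necessarily the layout of the original figures), chosen so that R and B are conditionally independent given Y but not given not-Y:

# Hypothetical 8-cell sample space; each cell is equally likely.
# Each cell is a tuple (in_R, in_B, in_Y).
cells = [
    (True,  True,  True),   # R and B and Y
    (True,  False, True),   # R only, inside Y
    (False, True,  True),   # B only, inside Y
    (False, False, True),   # neither, inside Y
    (True,  True,  False),  # R and B, outside Y
    (True,  True,  False),  # R and B, outside Y
    (False, False, False),  # neither, outside Y
    (False, False, False),  # neither, outside Y
]

def pr(event):
    """Probability of an event (a predicate over cells)."""
    return sum(1 for c in cells if event(c)) / len(cells)

def pr_given(event, cond):
    """Conditional probability Pr(event | cond)."""
    return pr(lambda c: event(c) and cond(c)) / pr(cond)

R = lambda c: c[0]
B = lambda c: c[1]
Y = lambda c: c[2]
not_Y = lambda c: not c[2]
R_and_B = lambda c: c[0] and c[1]

# Conditionally independent given Y (all values here are exact dyadic
# fractions, so exact float comparison is safe):
assert pr_given(R_and_B, Y) == pr_given(R, Y) * pr_given(B, Y)

# ...but not conditionally independent given not-Y:
assert pr_given(R_and_B, not_Y) != pr_given(R, not_Y) * pr_given(B, not_Y)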


Proximity and delays

Let A and B be the events that person A and person B, respectively, will be home in time for dinner, where both people are randomly sampled from the entire world. Events A and B can be assumed to be independent: knowledge that A is late barely changes the probability that B will be late. However, conditioned on a third event, that person A and person B live in the same neighborhood, the two events are no longer independent. Traffic conditions and weather-related events that might delay person A might delay person B as well. Given the third event, knowledge that person A is late does meaningfully change the probability that person B will be late.


Dice rolling

Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other: looking at the result of one die will not tell you about the result of the other. (That is, the two dice are independent.) If, however, the first die's result is a 3, and someone tells you about a third event, that the sum of the two results is even, then this extra information restricts the options for the second result to an odd number. In other words, two events can be independent, but not conditionally independent.
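This can be checked by enumerating the 36 equally likely outcomes of the two dice; a minimal Python sketch:

from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all 36 (die1, die2) pairs

def pr(pred, given=lambda o: True):
    """Exact probability Pr(pred | given) over equally likely outcomes."""
    cond = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in cond if pred(o)), len(cond))

second_is_5 = lambda o: o[1] == 5
first_is_3 = lambda o: o[0] == 3
sum_even = lambda o: (o[0] + o[1]) % 2 == 0

# Unconditionally, the dice are independent: both probabilities are 1/6.
assert pr(second_is_5, first_is_3) == pr(second_is_5)

# Given that the sum is even, the first result becomes informative
# about the second: 1/3 versus 1/6.
assert pr(second_is_5, lambda o: first_is_3(o) and sum_even(o)) != \
       pr(second_is_5, sum_even)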


Height and vocabulary

Height and vocabulary are dependent, since very small people tend to be children, known for their more basic vocabularies. But conditional on age, for example knowing that two people are both 19 years old, there is no reason to think that the taller person has the larger vocabulary.


Conditional independence of random variables

Two discrete random variables X and Y are conditionally independent given a third discrete random variable Z if and only if they are independent in their conditional probability distribution given Z. That is, X and Y are conditionally independent given Z if and only if, given any value of Z, the probability distribution of X is the same for all values of Y and the probability distribution of Y is the same for all values of X. Formally:

:X \perp\!\!\!\perp Y \mid Z \quad \iff \quad F_{X,Y\,\mid\,Z=z}(x,y) = F_{X\,\mid\,Z=z}(x) \cdot F_{Y\,\mid\,Z=z}(y) \quad \text{for all } x, y \text{ and } z

where F_{X,Y\,\mid\,Z=z}(x,y) = \Pr(X \leq x, Y \leq y \mid Z=z) is the conditional cumulative distribution function of X and Y given Z.

Two events R and B are conditionally independent given a σ-algebra \Sigma if

:\Pr(R, B \mid \Sigma) = \Pr(R \mid \Sigma)\Pr(B \mid \Sigma) \text{ a.s.}

where \Pr(A \mid \Sigma) denotes the conditional expectation of the indicator function of the event A, \chi_A, given the sigma algebra \Sigma. That is,

:\Pr(A \mid \Sigma) := \operatorname{E}[\chi_A \mid \Sigma].

Two random variables X and Y are conditionally independent given a σ-algebra \Sigma if the above equation holds for all R in \sigma(X) and B in \sigma(Y).

Two random variables X and Y are conditionally independent given a random variable W if they are independent given \sigma(W): the σ-algebra generated by W. This is commonly written:

:X \perp\!\!\!\perp Y \mid W or
:X \perp Y \mid W

This is read "X is independent of Y, given W"; the conditioning applies to the whole statement: "(X is independent of Y) given W".

:(X \perp\!\!\!\perp Y) \mid W

This notation extends X \perp\!\!\!\perp Y for "X is independent of Y." If W assumes a countable set of values, this is equivalent to the conditional independence of ''X'' and ''Y'' for the events of the form [''W'' = ''w'']. Conditional independence of more than two events, or of more than two random variables, is defined analogously.

The following two examples show that X \perp\!\!\!\perp Y ''neither implies nor is implied by'' (X \perp\!\!\!\perp Y) \mid W.

First, suppose W is 0 with probability 0.5 and 1 otherwise. When ''W'' = 0 take X and Y to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When W=1, X and Y are again independent, but this time they take the value 1 with probability 0.99. Then (X \perp\!\!\!\perp Y) \mid W. But X and Y are dependent, because Pr(''X'' = 0) < Pr(''X'' = 0 | ''Y'' = 0). This is because Pr(''X'' = 0) = 0.5, but if ''Y'' = 0 then it's very likely that ''W'' = 0 and thus that ''X'' = 0 as well, so Pr(''X'' = 0 | ''Y'' = 0) > 0.5.

For the second example, suppose X \perp\!\!\!\perp Y, each taking the values 0 and 1 with probability 0.5. Let W be the product X \cdot Y. Then when W=0, Pr(''X'' = 0) = 2/3, but Pr(''X'' = 0 | ''Y'' = 0) = 1/2, so (X \perp\!\!\!\perp Y) \mid W is false. This is also an example of explaining away. See Kevin Murphy's tutorial where X and Y take the values "brainy" and "sporty".
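Both counterexamples can be verified by direct computation over the small joint distributions described above; a minimal Python sketch:

from itertools import product

# Example 1: X and Y are conditionally independent given W, yet dependent.
# W is 0 or 1 with probability 0.5 each; given W, X and Y are i.i.d. with
# P(X = 0 | W = 0) = 0.99 and P(X = 0 | W = 1) = 0.01.
def p1(w, x, y):
    p = 0.99 if w == 0 else 0.01       # P(X=0 | W=w) = P(Y=0 | W=w)
    px = p if x == 0 else 1 - p
    py = p if y == 0 else 1 - p
    return 0.5 * px * py               # joint P(W=w, X=x, Y=y)

pX0 = sum(p1(w, 0, y) for w in (0, 1) for y in (0, 1))     # P(X=0) = 0.5
pY0 = sum(p1(w, x, 0) for w in (0, 1) for x in (0, 1))     # P(Y=0) = 0.5
pX0Y0 = sum(p1(w, 0, 0) for w in (0, 1))                   # P(X=0, Y=0)
print(pX0, pX0Y0 / pY0)   # 0.5 vs P(X=0 | Y=0) = 0.9802: dependent

# Example 2: X ⫫ Y unconditionally, but not given W = X * Y.
# X, Y i.i.d. uniform on {0, 1}: each of the 4 outcomes has probability 1/4.
outcomes = [(x, y) for x, y in product((0, 1), repeat=2)]
w0 = [(x, y) for x, y in outcomes if x * y == 0]           # outcomes with W=0
pX0_w0 = sum(1 for x, y in w0 if x == 0) / len(w0)         # P(X=0 | W=0) = 2/3
pX0_w0_Y0 = (sum(1 for x, y in w0 if x == 0 and y == 0)
             / sum(1 for x, y in w0 if y == 0))            # P(X=0 | Y=0, W=0) = 1/2
print(pX0_w0, pX0_w0_Y0)  # 2/3 vs 1/2: conditionally dependent given W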


Conditional independence of random vectors

Two random vectors \mathbf{X} = (X_1,\ldots,X_l)^{\mathrm{T}} and \mathbf{Y} = (Y_1,\ldots,Y_m)^{\mathrm{T}} are conditionally independent given a third random vector \mathbf{Z} = (Z_1,\ldots,Z_n)^{\mathrm{T}} if and only if they are independent in their conditional cumulative distribution given \mathbf{Z}. Formally:

:\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z} \quad \iff \quad F_{\mathbf{X},\mathbf{Y}\,\mid\,\mathbf{Z}=\mathbf{z}}(\mathbf{x},\mathbf{y}) = F_{\mathbf{X}\,\mid\,\mathbf{Z}=\mathbf{z}}(\mathbf{x}) \cdot F_{\mathbf{Y}\,\mid\,\mathbf{Z}=\mathbf{z}}(\mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y} \text{ and } \mathbf{z}

where \mathbf{x} = (x_1,\ldots,x_l)^{\mathrm{T}}, \mathbf{y} = (y_1,\ldots,y_m)^{\mathrm{T}} and \mathbf{z} = (z_1,\ldots,z_n)^{\mathrm{T}}, and the conditional cumulative distributions are defined as follows.

:\begin{aligned}
F_{\mathbf{X},\mathbf{Y}\,\mid\,\mathbf{Z}=\mathbf{z}}(\mathbf{x},\mathbf{y}) &= \Pr(X_1 \leq x_1,\ldots,X_l \leq x_l, Y_1 \leq y_1,\ldots,Y_m \leq y_m \mid Z_1=z_1,\ldots,Z_n=z_n) \\
F_{\mathbf{X}\,\mid\,\mathbf{Z}=\mathbf{z}}(\mathbf{x}) &= \Pr(X_1 \leq x_1,\ldots,X_l \leq x_l \mid Z_1=z_1,\ldots,Z_n=z_n) \\
F_{\mathbf{Y}\,\mid\,\mathbf{Z}=\mathbf{z}}(\mathbf{y}) &= \Pr(Y_1 \leq y_1,\ldots,Y_m \leq y_m \mid Z_1=z_1,\ldots,Z_n=z_n)
\end{aligned}


Uses in Bayesian inference

Let ''p'' be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses ''n'' voters randomly from the population. For ''i'' = 1, ..., ''n'', let ''X''''i'' = 1 or 0 corresponding, respectively, to whether or not the ''i''th chosen voter will or will not vote "yes".

In a frequentist approach to statistical inference one would not attribute any probability distribution to ''p'' (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that ''X''1, ..., ''X''''n'' are independent random variables.

By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to ''p'' regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that ''p'' is in any interval to which a probability is assigned. In that model, the random variables ''X''1, ..., ''X''''n'' are ''not'' independent, but they are conditionally independent given the value of ''p''. In particular, if a large number of the ''X''s are observed to be equal to 1, that would imply a high conditional probability, given that observation, that ''p'' is near 1, and thus a high conditional probability, given that observation, that the ''next'' ''X'' to be observed will be equal to 1.
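A minimal Python sketch can make this concrete. It assumes a uniform (Beta(1,1)) prior on ''p'', which is one common choice and is not specified above; with that prior, the posterior predictive probability is given by Laplace's rule of succession:

# Uniform prior on p (Beta(1,1)); X_1, ..., X_n are i.i.d. Bernoulli(p)
# given p. After observing k ones in n draws, the posterior on p is
# Beta(1 + k, 1 + n - k), so the posterior predictive probability is
# P(X_{n+1} = 1 | data) = (k + 1) / (n + 2)  (Laplace's rule of succession).
def predictive_next_is_one(k, n):
    return (k + 1) / (n + 2)

print(predictive_next_is_one(0, 0))     # 0.5    -- prior predictive
print(predictive_next_is_one(9, 10))    # 0.833  -- many observed 1s
print(predictive_next_is_one(90, 100))  # 0.892  -- belief that p is near 1

# Marginally the X_i are exchangeable but NOT independent: for example,
# P(X_2 = 1 | X_1 = 1) = 2/3, while P(X_2 = 1) = 1/2.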


Rules of conditional independence

A set of rules governing statements of conditional independence has been derived from the basic definition (J. Pearl, ''Causality: Models, Reasoning, and Inference'', Cambridge University Press, 2000). These rules were termed "Graphoid Axioms" by Pearl and Paz, because they hold in graphs, where X \perp\!\!\!\perp A \mid B is interpreted to mean: "All paths from ''X'' to ''A'' are intercepted by the set ''B''". Each rule can also be spot-checked numerically, as sketched below.


Symmetry

: X \perp\!\!\!\perp Y \mid Z \quad \Leftrightarrow \quad Y \perp\!\!\!\perp X \mid Z

Proof: From the definition of conditional independence,

: X \perp\!\!\!\perp Y \mid Z \quad \Leftrightarrow \quad P(X, Y \mid Z) = P(X \mid Z) P(Y \mid Z) \quad \Leftrightarrow \quad Y \perp\!\!\!\perp X \mid Z


Decomposition

: X \perp\!\!\!\perp Y \mid Z \quad \Rightarrow \quad h(X) \perp\!\!\!\perp Y \mid Z

Proof: From the definition of conditional independence, we seek to show that

: X \perp\!\!\!\perp Y \mid Z \quad \Rightarrow \quad P(h(X), Y \mid Z) = P(h(X) \mid Z) P(Y \mid Z).

The left side of this equality is

: P(h(X)=a, Y=y \mid Z=z) = \sum_{x \,:\, h(x)=a} P(X=x, Y=y \mid Z=z),

where the expression on the right side is the summation, over all x such that h(x)=a, of the conditional probability of X and Y given Z. Further decomposing,

: \begin{aligned} \sum_{x \,:\, h(x)=a} P(X=x, Y=y \mid Z=z) &= \sum_{x \,:\, h(x)=a} P(X=x \mid Z=z) P(Y=y \mid Z=z) \\ &= P(Y=y \mid Z=z) \sum_{x \,:\, h(x)=a} P(X=x \mid Z=z) \\ &= P(Y=y \mid Z=z)\, P(h(X)=a \mid Z=z). \end{aligned}

Special cases of this property include:

* (X, W) \perp\!\!\!\perp Y \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp Y \mid Z
** Proof: Define A = (X, W) and let h(\cdot) be the 'extraction' function h(X, W) = X. Then:

: \begin{aligned} (X,W) \perp\!\!\!\perp Y \mid Z \quad &\Leftrightarrow \quad A \perp\!\!\!\perp Y \mid Z \\ &\Rightarrow \quad h(A) \perp\!\!\!\perp Y \mid Z \qquad &\text{(decomposition)} \\ &\Leftrightarrow \quad X \perp\!\!\!\perp Y \mid Z \end{aligned}

* X \perp\!\!\!\perp (Y, W) \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp Y \mid Z
** Proof: Define V = (Y, W) and let h(\cdot) again be the 'extraction' function h(Y, W) = Y. Then:

: \begin{aligned} X \perp\!\!\!\perp (Y,W) \mid Z \quad &\Leftrightarrow \quad X \perp\!\!\!\perp V \mid Z \\ &\Leftrightarrow \quad V \perp\!\!\!\perp X \mid Z \qquad &\text{(symmetry)} \\ &\Rightarrow \quad h(V) \perp\!\!\!\perp X \mid Z \qquad &\text{(decomposition)} \\ &\Leftrightarrow \quad Y \perp\!\!\!\perp X \mid Z \\ &\Leftrightarrow \quad X \perp\!\!\!\perp Y \mid Z \qquad &\text{(symmetry)} \end{aligned}


Weak union

: X \perp\!\!\!\perp Y \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp Y \mid (Z, h(X))

Proof: Given X \perp\!\!\!\perp Y \mid Z, we aim to show

: \begin{aligned} X \perp\!\!\!\perp Y \mid (Z, h(X)) \quad &\Leftrightarrow \quad X \perp\!\!\!\perp Y \mid U \qquad &\text{(where } U = (Z, h(X))\text{)} \\ &\Leftrightarrow \quad Y \perp\!\!\!\perp X \mid U \qquad &\text{(symmetry)} \\ &\Leftrightarrow \quad P(Y \mid X, U) = P(Y \mid U) \\ &\Leftrightarrow \quad P(Y \mid X, Z, h(X)) = P(Y \mid Z, h(X)). \end{aligned}

We begin with the left side of the last equation:

: \begin{aligned} P(Y \mid X, Z, h(X)) &= P(Y \mid X, Z) \qquad &\text{(since } h(X) \text{ is a function of } X\text{)} \\ &= P(Y \mid Z) \qquad &\text{(since } Y \perp\!\!\!\perp X \mid Z\text{)} \end{aligned}

From the given condition,

: \begin{aligned} X \perp\!\!\!\perp Y \mid Z \quad &\Rightarrow \quad h(X) \perp\!\!\!\perp Y \mid Z \qquad &\text{(decomposition)} \\ &\Leftrightarrow \quad Y \perp\!\!\!\perp h(X) \mid Z \qquad &\text{(symmetry)} \\ &\Rightarrow \quad P(Y \mid Z, h(X)) = P(Y \mid Z). \end{aligned}

Thus P(Y \mid X, Z, h(X)) = P(Y \mid Z) = P(Y \mid Z, h(X)), so we have shown that X \perp\!\!\!\perp Y \mid (Z, h(X)).

Special cases: Some textbooks present the property as

* X \perp\!\!\!\perp (Y, W) \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp Y \mid (Z, W)
* (X, W) \perp\!\!\!\perp Y \mid Z \quad \Rightarrow \quad X \perp\!\!\!\perp Y \mid (Z, W)

Both versions can be shown to follow from the weak union property given initially, via the same method as in the decomposition section above.


Contraction

: \left.\begin{matrix} X \perp\!\!\!\perp A \mid B \\ X \perp\!\!\!\perp B \end{matrix}\right\} \quad \Rightarrow \quad X \perp\!\!\!\perp A, B

Proof: This property can be proved by noticing \Pr(X \mid A, B) = \Pr(X \mid B) = \Pr(X), where each equality is asserted by X \perp\!\!\!\perp A \mid B and X \perp\!\!\!\perp B, respectively.


Intersection

For strictly positive probability distributions, the following also holds:

: \left.\begin{matrix} X \perp\!\!\!\perp Y \mid Z, W \\ X \perp\!\!\!\perp W \mid Z, Y \end{matrix}\right\} \quad \Rightarrow \quad X \perp\!\!\!\perp W, Y \mid Z

Proof: By assumption,

: P(X \mid Z, W, Y) = P(X \mid Z, W) \quad \text{and} \quad P(X \mid Z, W, Y) = P(X \mid Z, Y) \quad \implies \quad P(X \mid Z, Y) = P(X \mid Z, W).

Using this equality, together with the law of total probability applied to P(X \mid Z):

: \begin{aligned} P(X \mid Z) &= \sum_{w} P(X \mid Z, W=w)\,P(W=w \mid Z) \\ &= \sum_{w} P(X \mid Z, Y)\,P(W=w \mid Z) \\ &= P(X \mid Z, Y) \sum_{w} P(W=w \mid Z) \\ &= P(X \mid Z, Y) \end{aligned}

Since P(X \mid Z, W, Y) = P(X \mid Z, Y) and P(X \mid Z, Y) = P(X \mid Z), it follows that P(X \mid Z, W, Y) = P(X \mid Z), which is equivalent to X \perp\!\!\!\perp Y, W \mid Z.

Technical note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say ''K''. For example, X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X would also mean that X \perp\!\!\!\perp Y \mid K \Rightarrow Y \perp\!\!\!\perp X \mid K.


See also

* Graphoid
* Conditional dependence
* de Finetti's theorem
* Conditional expectation

